ABSTRACT 


With the development of technologies for entity extraction, relationship extraction, knowledge reasoning, 
and entity linking, research on knowledge graphs has flourished in recent years. To better promote the 
development of knowledge graphs, especially in the Chinese language and in the financial industry, we 
built a high-quality data set, named the financial research report knowledge graph (FR2KG), and organized 
an evaluation of the automated construction of financial knowledge graphs at the 2020 China Knowledge 
Graph and Semantic Computing Conference (CCKS2020). FR2KG consists of 17,799 entities, 26,798 
relationship triples, and 1,328 attribute triples covering 10 entity types, 19 relationship types, and 6 attributes. 
Participants were required to develop a constructor that automatically builds a financial knowledge 
graph based on FR2KG. In addition, we summarize the technologies for automatically constructing 
knowledge graphs, and introduce the methods used by the winners and the results of this evaluation. 


1. INTRODUCTION 


With the advancement of technologies such as entity extraction, relation extraction, knowledge reasoning, 
and entity linking, research on knowledge graphs has flourished in recent years. However, technologies 
for systematically and automatically constructing knowledge graphs remain insufficient, even though the 
ability to construct knowledge graphs automatically determines how widely knowledge graph technologies 
can be applied. To promote the further development of knowledge graph technologies, a good data set 
and an evaluation system are required. In this regard, ImageNet [1] is a classic example that has 
significantly promoted the development of computer vision. Similarly, in the field of knowledge graphs, 
the Text Analysis Conference (TAC) [2] released multiple data sets and organized corresponding 
evaluations to continuously promote the development of knowledge graphs. However, there is no similar 
data set or evaluation in the Chinese language. 


The financial knowledge graph is one of the most widely used domain-specific knowledge graphs, with 
applications in investment research, risk tracking and control, corporate public opinion management, 
intelligent question answering, and industrial analysis. In the financial field, constructing financial 
knowledge graphs from various unstructured texts is a basic task of great value. However, few 
studies [3, 4] have constructed financial knowledge graphs. In particular, there is no data set for the 
automated construction of financial knowledge graphs, and no corresponding evaluation. 


To better promote the development of knowledge graphs, especially in Chinese and in the financial 
industry, we organized an evaluation of the automated construction of financial knowledge graphs at the 
2020 China Knowledge Graph and Semantic Computing Conference (CCKS2020). The data source for the 
evaluation was Chinese financial research reports on macroeconomics, industries, and companies, written 
by professional financial institutions. The reports are characterized by comprehensiveness, high reliability, 
depth, and high quality. Their content encompasses financial indicators, policies, and rich 
data, which are highly suitable for building a financial knowledge graph for financial institutions, 
governments, research institutes, etc., to provide an in-depth analysis and intelligent decision-making 
support. However, because the data and knowledge of financial research reports cover a wide range of 
areas and contain professional knowledge, different people express the same content differently, which 
makes it quite difficult to construct a knowledge graph from financial research reports. Solving these 
problems can greatly boost the development of automated construction of knowledge graphs, and have 
great academic value. We collected 1,200 financial research reports, and designed a knowledge graph 
schema with 10 entity types, 19 relationship types, and 6 attributes. Based on this schema, experts in the 
financial industry annotated 17,799 entities, 26,798 relationship triples <entity, relationship, entity>, and 
1,328 attribute triples <entity, attribute key, attribute value>. The result is the largest data set for the 
automated construction and evaluation of knowledge graphs that has been published in Chinese. The 
evaluation follows the design of the TAC 2016 Cold Start KBP Track [5]: starting with a predefined 
knowledge graph schema and a seed knowledge graph, participants automatically extract entities, 
relationship triples, and attribute triples from the unstructured texts of financial research reports. The 
evaluation does not limit the algorithms and models used. Participants can use open pre-trained models, 
such as BERT [6] and ERNIE [7], and third-party open knowledge graphs, such as those from OpenKG. 
Participants are also encouraged to use various methods, such as unsupervised learning, weak supervision, 
and distant supervision, to realize the automated construction of knowledge graphs. 


The remainder of this paper is organized as follows. Section 2 presents the evaluation tasks and methods 
with the release of a professional Financial Research Report Knowledge Graph (FR2KG) data set. The 
properties of the FR2KG data set are described in Section 3. Section 4 surveys the current knowledge graph 
construction technologies, and Section 5 introduces the methods used by the winners and the results of 
this evaluation. Finally, the challenges and prospects for the automated construction of domain knowledge 
graphs are discussed in Section 6. 


2. TASK DEFINITION AND EVALUATION METRICS 
2.1 Task Definition 


The task in this evaluation is to construct a financial knowledge graph from the unstructured text of 
financial research reports, based on a given knowledge graph schema: 


• Given: unstructured text of the financial research reports 

• Given: the schema of the knowledge graph 

• Given: the seed knowledge graph 

• Participants are required to develop a constructor that extracts entities, attribute triples, and relationship 
triples that conform to the schema from the unstructured text provided. 
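
As an illustration of what "conforming to the schema" can mean for relationship triples, the following minimal sketch checks a candidate triple against a set of allowed <head entity type, relationship, tail entity type> combinations. The four combinations are taken from the relationship statistics reported for FR2KG in Section 3.5; the function name `conforms`, the dictionary layout, and the example entity names are assumptions made for this sketch and are not part of the evaluation tooling.

```python
# A small, illustrative subset of the FR2KG schema (see Section 3.5).
SCHEMA_RELATIONS = {
    ("Person", "Invest", "Organization"),
    ("Organization", "Invest", "Organization"),
    ("Person", "Work for", "Organization"),
    ("Organization", "Provide", "Service"),
}


def conforms(triple, entity_types, schema=SCHEMA_RELATIONS):
    """Return True if a <head, relationship, tail> triple fits the schema.

    entity_types maps an entity name to its entity type (hypothetical layout).
    """
    head, relation, tail = triple
    # Both entities must have been extracted with a known type.
    if head not in entity_types or tail not in entity_types:
        return False
    return (entity_types[head], relation, entity_types[tail]) in schema


# Hypothetical example data.
types = {"Alice": "Person", "AlphaCorp": "Organization", "Cloud hosting": "Service"}
print(conforms(("Alice", "Work for", "AlphaCorp"), types))      # allowed combination
print(conforms(("Alice", "Provide", "Cloud hosting"), types))   # not in the subset
```

A constructor could apply such a filter as a final step so that only schema-conforming triples are submitted.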


There are 1,200 financial research reports. After removing tables, images, headers, footers, and other 
useless and repetitive information, the remaining content was converted into plain text. We then worked 
with financial research experts to analyze these research reports, and designed a schema for the financial 
knowledge graph based on the characteristics of financial research and the requirements of a technical 
evaluation. Next, the unstructured text was annotated by trained annotators, and the results were reviewed 
by financial research experts. The annotated knowledge graph (annotated KG) therefore consists entirely 
of reviewed data. The annotated KG was randomly divided into a seed knowledge graph (seed KG) and an 
evaluation knowledge graph (evaluation KG), as follows: 


1) Randomly select 200 of the 1,200 TXT files; 

2) Take the entities, relationship triples, and attribute triples annotated in these 200 TXT files 
as the seed KG; and 

3) Remove all the data in the seed KG from the annotated KG, and use the remaining data as the 
evaluation KG. 
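
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the annotations are assumed to be stored as a dictionary from a file identifier to the set of items annotated in that file, and the function name `split_annotated_kg`, the item layout, and the random seed are hypothetical.

```python
import random


def split_annotated_kg(doc_ids, annotations, n_seed=200, rng_seed=0):
    """Split the annotated KG into a seed KG and an evaluation KG.

    annotations: dict mapping a file id to the set of annotated items
    (entities or triples, stored as tuples) found in that file.
    """
    rng = random.Random(rng_seed)
    # Step 1: randomly select n_seed files.
    seed_docs = set(rng.sample(sorted(doc_ids), n_seed))
    # Step 2: everything annotated in those files forms the seed KG.
    seed_kg = set()
    for doc_id in seed_docs:
        seed_kg |= annotations.get(doc_id, set())
    # Step 3: remove the seed KG from the full annotated KG.
    annotated_kg = set().union(*annotations.values())
    evaluation_kg = annotated_kg - seed_kg
    return seed_docs, seed_kg, evaluation_kg
```

Because step 3 is a set difference, an item that appears both in a seed file and in a non-seed file ends up only in the seed KG, which matches the rule that the evaluation KG contains no data from the seed KG.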


This completes the description of the data processing and annotation process. The result is the full 
FR2KG data set, comprising the knowledge graph schema, the unstructured text of the financial research 
reports, the seed KG, and the evaluation KG. The goal of this evaluation is to use FR2KG to develop a 
financial knowledge graph constructor that automatically extracts entities, attribute triples, and 
relationship triples from unstructured text. Data that already exist in the seed KG are excluded from the 
constructed financial knowledge graph, and the evaluation procedure uses the metrics described in the 
next section to measure the quality of the knowledge graph. The entire process is shown in Figure 1. 

Figure 1. Schematic of the task process. 


Given the FR2KG data set, the goal of the participants is to develop the financial knowledge graph 
constructor that is most effective at automatically extracting entities, attribute triples, and relationship 
triples from the unstructured text of financial research reports, and at constructing a financial knowledge 
graph that is as consistent as possible with the knowledge graph annotated by experts. To stay as close to 
real application scenarios as possible while remaining fair and reasonable to all participants, this 
evaluation allows all participants to use various open or public data, including, but not limited to, 
pre-trained models, open knowledge graphs from OpenKG, and other sources. Participants who would like to 
use private data must first make the data publicly available so that other participants can use them too. 


2.2 Evaluation Metrics 


This evaluation task uses the F1 score defined as follows to evaluate the performance of the knowledge 
graph constructor. The higher the F1 score, the better is the performance. The data of the knowledge graph 
are divided into three types: entity, attribute triples <entity, attribute key, attribute value>, and relationship 
triples <entity, relationship, entity>. Precision (p), recall (r), and F1 score (F1) are defined as follows. First, 
we define the following variables: 


$Y_E$: The set of all <entity, entity type> pairs extracted by the constructor; $|Y_E|$ represents the number 
of entities. 


$\hat{Y}_E$: The set of all <entity, entity type> pairs annotated by experts; $|\hat{Y}_E|$ represents the number 
of entities. 


$Y_A$: The set of all attribute triples <entity, attribute key, attribute value> extracted by the constructor; 
$|Y_A|$ represents the number of triples. 


$\hat{Y}_A$: The set of all attribute triples <entity, attribute key, attribute value> annotated by experts; 
$|\hat{Y}_A|$ represents the number of triples. 


$Y_R$: The set of all relationship triples <entity, relationship, entity> extracted by the constructor; 
$|Y_R|$ represents the number of triples. 


$\hat{Y}_R$: The set of all relationship triples <entity, relationship, entity> annotated by experts; 
$|\hat{Y}_R|$ represents the number of triples. 


$Y \cap \hat{Y}$: The intersection of $Y$ and $\hat{Y}$, that is, the part extracted by the constructor that is 
identical to the expert annotation. It represents the entities, attribute triples, or relationship triples 
correctly extracted by the constructor, and $|Y \cap \hat{Y}|$ represents their number. 


Thus, we have the following: 


Entities: 

$$P_E = \frac{|Y_E \cap \hat{Y}_E|}{|Y_E|} \qquad (1)$$

$$R_E = \frac{|Y_E \cap \hat{Y}_E|}{|\hat{Y}_E|} \qquad (2)$$

$$\mathrm{F1}_E = \frac{2 \times P_E \times R_E}{P_E + R_E} \qquad (3)$$


Attribute triples: 

$$P_A = \frac{|Y_A \cap \hat{Y}_A|}{|Y_A|} \qquad (4)$$

$$R_A = \frac{|Y_A \cap \hat{Y}_A|}{|\hat{Y}_A|} \qquad (5)$$

$$\mathrm{F1}_A = \frac{2 \times P_A \times R_A}{P_A + R_A} \qquad (6)$$


Relationship triples: 

$$P_R = \frac{|Y_R \cap \hat{Y}_R|}{|Y_R|} \qquad (7)$$

$$R_R = \frac{|Y_R \cap \hat{Y}_R|}{|\hat{Y}_R|} \qquad (8)$$

$$\mathrm{F1}_R = \frac{2 \times P_R \times R_R}{P_R + R_R} \qquad (9)$$


Finally, we define the final evaluation F1-score of the entire knowledge graph as in Equation (10): 


$$\mathrm{F1} = \frac{\mathrm{F1}_E + 2 \times \mathrm{F1}_A + 2 \times \mathrm{F1}_R}{5} \qquad (10)$$


We use a weighted average of the F1 scores of the three types (entities, attribute triples, and relationship 
triples), in which attribute triples and relationship triples each receive twice the weight of entities, because 
we consider extracting them to be about twice as difficult as extracting entities. For an attribute triple, 
both the entity and the attribute value must be extracted correctly, and the attribute value must be matched 
to the attribute key, which is comparable to identifying an entity and matching its entity type. We therefore 
set the weight of attribute extraction to twice that of entity extraction. For a relationship triple, two 
entities must be extracted correctly and matched with the corresponding relationship type. The relative 
difficulty of entity extraction, attribute extraction, and relationship extraction is a significant research 
topic in its own right. When determining the evaluation metrics, we surveyed the literature and found no 
corresponding research, so we set the weight to 2 based on our experience. 
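
The metric can be illustrated with a short Python sketch that treats extracted and annotated items as sets and applies the weights 1, 2, and 2 to entities, attribute triples, and relationship triples. The function names `prf` and `overall_f1` are hypothetical, and returning 0 when a set is empty is an assumption of this sketch; the official scoring script may handle such edge cases differently.

```python
def prf(predicted, gold):
    """Precision, recall, and F1 between a predicted set and a gold set."""
    correct = len(predicted & gold)  # size of the intersection
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1


def overall_f1(pred_entities, gold_entities,
               pred_attrs, gold_attrs,
               pred_rels, gold_rels):
    """Weighted average of the three F1 scores with weights 1, 2, and 2."""
    f1_e = prf(pred_entities, gold_entities)[2]
    f1_a = prf(pred_attrs, gold_attrs)[2]
    f1_r = prf(pred_rels, gold_rels)[2]
    return (f1_e + 2 * f1_a + 2 * f1_r) / 5
```

For instance, a constructor that scores 1.0 on entities, 0.5 on attribute triples, and 0.0 on relationship triples obtains (1.0 + 2 × 0.5 + 2 × 0.0) / 5 = 0.4 overall, which shows how strongly the two triple types dominate the final score.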


3. PROPERTIES OF FR2KG 
3.1 FR2KG Overview 


Among the data sets that have been constructed, some are manually annotated [8, 9], some are annotated 
collaboratively by humans and algorithms, and others are labeled with higher precision through better 
algorithms. However, most of these data sets cover general-domain text, such as news and common-sense 
articles (for example, Wikipedia), or a few specific domains, such as biomedical and medical text and 
scientific and technological literature. Data sets for financial knowledge graphs are rare, and Chinese 
financial knowledge graph data sets are even rarer. This evaluation publishes the FR2KG data set and the 
corresponding unstructured texts for the first time, aiming to promote the development of distant 
supervision and weak supervision technologies for automatically constructing domain knowledge graphs. 


The construction process of the FR2KG data set is described in the previous section, as shown in 
Figure 1. First, 1,200 financial research reports were collected. Experts in the financial field analyzed these 
reports, extracted the plain text from the main body, and saved it in the TXT format as the basic unstructured 
text corpus. Then, the experts and the knowledge graph team studied these corpora together, designed the 
schema of the knowledge graph from the perspective of financial business, and performed iterative 
optimization according to the characteristics of the evaluation, and finally, determined that it contained 10 
entity types, 6 entity attributes, and 19 relationships between the entities. Subsequently, these corpora were 
annotated with the help of the annotation system of the Yuanhai Knowledge Graph Platform, which is a 
product of DataGrand Inc. The annotation system is specifically used for the annotation of the knowledge 
graph, and supports the annotation of entities, entity attributes, and relationships between entities. Before 
annotating, all annotators were trained by financial experts to align their understanding of the schema. All 
annotated data were reviewed by experts, and then, divided into seed KG and evaluation KG, as described 
in the previous section. Examples of the FR2KG data set are shown in Figures 2 and 3. 


Figure 2. An example of the FR2KG in Yuanhai Knowledge Graph Platform. 

Figure 3. A simplified example of FR2KG. 

A summary of FR2KG is shown in Table 1. It is currently the largest data set for the automatic construction 
of Chinese financial knowledge graphs. Table 1 describes the data, and the following sections introduce 
FR2KG in detail. 


Table 1. Summarization of FR2KG. 


                 Number of entities   Number of relationship triples   Number of attribute triples 
Seed KG          5,131                6,091                            354 
Evaluation KG    12,668               20,707                           974 


3.2 Financial Research Reports 


The length of the financial research reports varies. As shown in Table 2, the longest text has 13,857 
characters, while the shortest has only 242 characters, so the longest is nearly 60 times the length of the 
shortest. However, most texts contain between 1,000 and 3,000 characters, accounting for 70% of all 
texts. In terms of paragraphs, the shortest text has 4 paragraphs, and the longest has 74 paragraphs. 
Most reports have between 10 and 30 paragraphs, accounting for 82% of all texts. 

Table 2. Statistics of 1,200 financial research report texts in FR2KG. 


                     By characters   By paragraphs 
Mean                 2,214           19.9 
Standard Deviation   1,173           9.5 
Minimum              242             4 
First Quartile       1,392           14 
Median               2,018           18 
Third Quartile       2,779           24 
Maximum              13,857          74 


3.3 Schema 


Figure 4 shows the schema of FR2KG. There are 10 entity types in total, which are represented by ellipses, 
and 19 relationships between entity types, which are represented by directed arrows. For example, for the 
relationship <person, invest, organization>, the directed arrow in the figure points from "person" to 
"organization". Three of these entity types have attributes (Table 3). Notably, attribute values of the date 
type were normalized to the "YYYY-mm-dd" format during annotation, and participants were also required 
to normalize dates when constructing the knowledge graph. 
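
A minimal sketch of such date normalization is given below. It assumes that dates appear in the text either in the common Chinese form (for example, 2020年3月5日) or with ".", "/", or "-" separators; which surface forms actually occur in the reports is an assumption here, and the function name `normalize_date` is hypothetical.

```python
import re
from datetime import date

# Year, month, and day separated by Chinese date characters or ".", "/", "-".
_DATE_PATTERN = re.compile(
    r"(\d{4})\s*[年./-]\s*(\d{1,2})\s*[月./-]\s*(\d{1,2})\s*日?"
)


def normalize_date(text):
    """Return the first date found in text as 'YYYY-mm-dd', or None."""
    match = _DATE_PATTERN.search(text)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        # date() rejects impossible dates such as month 13.
        return date(year, month, day).isoformat()
    except ValueError:
        return None


print(normalize_date("2020年3月5日"))
print(normalize_date("2020/03/05"))
```

Both calls print the same normalized string, so attribute values extracted from differently formatted reports can be compared directly with the annotated values.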



Figure 4. The knowledge graph schema of FR2KG. 

Table 3. Attributes of entity types of FR2KG schema. 


Entity type       Attribute key     Data type of attribute value 
Research Report   Publish Date      Date 
                  Rating            String 
                  Previous Rating   String 
Organization      Full Name         String 
                  English Name      String 
Article           Publish Date      Date 


The FR2KG schema shown in Figure 4 supports a wide range of applications, including investment 
research, financial risk assessment and control, product analysis, and industrial chain analysis. 
For example, the relationships <person, invest, organization> and <organization, invest, organization> 
can be used for in-depth investment and financing analysis. Similarly, the relationships 
<organization, sale, product> and <organization, buy, product> can be used for supply chain analysis, 
such as mining a company’s advantages in the supply chain and assessing supply chain risks. 
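
To illustrate the supply chain example, the sketch below joins "sale" and "buy" relationship triples on the shared product to obtain candidate supplier-customer links. The triple layout, the function name `supply_links`, and the example names are assumptions made for illustration; a shared product only suggests a possible link and does not prove an actual trading relationship.

```python
from collections import defaultdict


def supply_links(triples):
    """Pair each buyer of a product with the organizations that sell it.

    triples: iterable of (head, relationship, tail) tuples.
    Returns a set of (seller, product, buyer) candidate links.
    """
    sellers = defaultdict(set)
    buyers = defaultdict(set)
    for head, relation, tail in triples:
        if relation == "sale":
            sellers[tail].add(head)
        elif relation == "buy":
            buyers[tail].add(head)
    links = set()
    for product, buying_orgs in buyers.items():
        for buyer in buying_orgs:
            for seller in sellers.get(product, ()):
                if seller != buyer:
                    links.add((seller, product, buyer))
    return links


# Hypothetical example data.
example = [
    ("AlphaCorp", "sale", "chips"),
    ("BetaCorp", "buy", "chips"),
    ("GammaCorp", "sale", "screens"),
    ("BetaCorp", "invest", "GammaCorp"),
]
print(supply_links(example))
```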


3.4 Entities and Attributes 
Tables 4 and 5 summarize the entities and their attribute triples of FR2KG. 


Table 4. Statistics of entities of FR2KG. 


Entity type       Number in seed KG   Number in evaluation KG 
Person            86                  383 
Industry          552                 1,253 
Service           664                 1,053 
Product           1,218               3,366 
Research Report   73                  307 
Organization      1,742               3,739 
Risk              243                 826 
Article           170                 450 
Indicator         239                 856 
Brand             144                 435 
Total             5,131               12,668 


Table 5. Statistics of attribute triples of FR2KG. 


Entity type       Attribute key     Number in seed KG   Number in evaluation KG 
Research Report   Publish Date      70                  284 
                  Rating            63                  216 
                  Previous Rating   46                  155 
Organization      Full Name         90                  35 
                  English Name      31                  111 
Article           Publish Date      54                  173 
Total                               354                 974 


3.5 Relationships 


Table 6 summarizes the statistics of relationship triples in the FR2KG. 


Table 6. Statistics of relationship triples of FR2KG. 


Head entity type   Relationship      Tail entity type   Number in seed KG   Number in evaluation KG 
Product            Use               Brand              100                 189 
Organization       Use               Industry           540                 2,081 
Organization       Invest            Organization       88                  239 
Organization       Has               Brand              85                  331 
Organization       Affiliated with   Organization       120                 401 
Industry           Subordinate of    Industry           109                 485 
Organization       Is customer       Organization       159                 243 
Organization       Merge             Organization       65                  196 
Organization       Publish           Article            111                 287 
Person             Invest            Organization       7                   51 
Person             Work for          Organization       54                  259 
Organization       Provide           Service            563                 1,235 
Organization       Buy               Product            45                  75 
Organization       Sale              Product            758                 2,232 
Organization       Has               Risk               171                 422 
Industry           Has               Risk               68                  431 
Research Report    Use               Indicator          411                 1,736 
Research Report    Relate to         Industry           748                 1,658 
Research Report    Relate to         Organization       1,889               8,156 
Total                                                   6,091               20,707 


3.6 Properties of FR2KG 


As a complete domain knowledge graph data set, FR2KG is currently the largest Chinese financial 
knowledge graph data set dedicated to the automated construction of knowledge graphs. In the future, we 
plan to continue expanding and enriching its content. 


Scale: FR2KG aims to promote the development of automated construction of domain knowledge graphs. 
It offers rich and diverse data types and is currently the largest data set of its kind. Its content is 
also broad, covering common stock research reports, industry research reports, and macroeconomic 
research reports in the financial industry. 


Relationship: The data set provides 19 common relationships in the financial field, as shown in Figure 
4. These relationships support multiple and diverse analyses in the financial industry. For example, 
the relationships <organization, has, risk> and <industry, has, risk> can improve industry or enterprise 
risk analysis, risk assessment, and early risk warning. The relationships <person, work for, organization>, 
<person, invest, organization>, and <organization, invest, organization> can be applied to deep-level 
equity relationship mining, which has great value in bank loans and investment analysis. 


Professionalism: The schema of FR2KG is jointly designed by experts in the financial industry and 
knowledge graph experts. The annotators were trained by financial experts before annotation, and the 
results were reviewed by financial experts to ensure the high professionalism of the data set. 


Diversity: The goal of FR2KG is to evaluate the performance of the automated construction of financial 
knowledge graphs; however, the application of the data set is not limited to this objective. Various other 
technologies related to the knowledge graphs can also be evaluated using FR2KG. For example, common 
tasks, such as link prediction and node classification in graph neural networks, tasks related to various 
graph algorithms, and tasks based on deep learning techniques to implement traditional graph algorithms, 
can be evaluated. 


4. RECENT TRENDS IN KNOWLEDGE GRAPH AUTOMATION CONSTRUCTION 
4.1 Entity Extraction 


Entity extraction, also known as named entity recognition (NER), aims to recognize mentions of rigid 
designators in text that belong to predefined semantic types such as person, location, and organization. 
Two popular data sets in recent work are CoNLL03 [14] and OntoNotes5.0. CoNLL03 contains annotations 
for Reuters news in English and German. The English data set contains a large portion of sports news with 
annotations in four entity types: person, location, organization, and miscellaneous entities. OntoNotes5.0 
contains annotations for a large corpus, comprising various genres with structural information and shallow 
semantics, and is annotated using 18 entity types. BOSON [15], People’s Daily [16], and 
MSRA [17] are Chinese entity extraction data sets in general fields, while FR2KG proposed in this article 
focuses on the Chinese financial field. 


In recent years, research on supervised entity extraction has mainly focused on input representations 
and on the design of neural models, including context encoders and tag decoders. In addition, 
unsupervised and semi-supervised entity extraction has achieved remarkable development. 


4.1.1 Supervised Entity Extraction 


Input representation is the first step in the entity extraction. In this subsection, we summarize word-level 
representation, character-level representation, language model, and other representations. Since Mikolov 
et al. [18] proposed word2vec, many studies on entity extraction have used the word2vec toolkit to 
train word-level representation on different corpora, such as PubMed [19], Gigaword [20], NYT [21], and 
SENNA [22]. In addition, GloVe [23] and FastText [24] are widely used. Beyond word-level 
representations, character-level representations have been found useful for exploiting explicit 
sub-word-level information and naturally handling out-of-vocabulary words [21, 25, 26]. Both word-level 
and character-level representations capture only the meaning of the word itself, without its context. 
Therefore, many studies have added context-dependent language model representations to the input 
representation. Peters et al. [27] proposed TagLM, a language model augmented sequence tagger. This tagger 
considers both pre-trained word embeddings and bidirectional language model embeddings for each token 
in the input sequence. Based on TagLM, Peters et al. [26] proposed the famous pre-trained bidirectional 
language model ELMo. The key difference between ELMo and TagLM is that ELMo allows the task model 
to learn a weighted average of all bidirectional LM layers, whereas TagLM only uses the top bidirectional 
LM layer. In contrast to CNNs and recurrent neural networks (RNNs), transformers [28] utilize stacked 
self-attention and point-wise, fully connected layers to build basic blocks for the encoder and decoder. 
Based on the transformer, BERT [6] was proposed to pre-train a deep bidirectional transformer by jointly 
conditioning on both the left and right contexts in all layers. Combining pre-trained language model 
embedding with 
traditional embedding has become a de facto standard [29, 30, 31, 32, 33]. In addition, novel input 
representations are still being explored, such as external knowledge from Wikidata [30], dependency 
trees [31], and global contextual embedding [32, 33]. 


After converting the input sentence into a representation, the context encoder captures the context 
dependencies, and the tag decoder predicts tags for tokens in the input sequence. Collobert et al. [34] used 
a CNN to produce local features around each word, and applied a maximum or averaging operation to 
extract global features. Strubell et al. [35] proposed an iterated dilated CNN (ID-CNN), where four stacked 
dilated convolutions having a width of three obtained more contextual information. Compared to CNN, 
the bidirectional RNN makes full use of the forward and backward information in the sentence, which can 
effectively extract the features of the entire sentence. Therefore, a bidirectional RNN is the most popular 
encoder for entity extraction tasks. Although the hidden layer of the bidirectional RNN can be connected 
directly to a softmax layer, adding a CRF layer as the tag decoder helps the model learn constraints 
between adjacent tags in a sentence, and thus ensures that the predicted tag sequence is valid. 
Huang et al. [22] were the first to 
utilize the BiLSTM-CRF architecture to implement sequence tagging tasks, including POS, chunking, and 
NER. Similar to [22], many studies also used BiLSTM as an encoder and CRF as a decoder [21, 36, 37]. In 
addition to CRF, RNN [38] and pointer network (PtrNet) [39] have also been explored as tag decoders. 
Shen et al. [40] reported that RNN tag decoders outperform CRF and are faster to train when the number 
of entity types is large. However, a major disadvantage of RNN and PtrNet decoders lies in greedy decoding, 
meaning that the input of the current step requires the output of the previous step. Because the pre-trained 
model can capture sufficient semantic information, some studies only use BERT and completely abandon 
BiLSTM-CRF. In particular, Li et al. [41] framed the NER task as a machine reading comprehension (MRC) 
problem, which can be solved by fine-tuning the BERT model. 
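
The benefit of a CRF tag decoder over independent per-token softmax predictions can be illustrated with Viterbi decoding, the standard inference procedure for linear-chain CRFs. The sketch below is a simplified, pure-Python version under stated assumptions: emission scores come from some encoder (for example, a BiLSTM), transition scores are given as a dictionary with missing pairs scored as 0, and start and end transitions are omitted. The function name `viterbi_decode` and all scores are hypothetical.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions: list with one dict per token, mapping tag -> score.
    transitions: dict mapping (previous tag, current tag) -> score.
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any tag path ending in `tag` at token t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag]: best previous tag for `tag` at token t
    for scores in emissions[1:]:
        current, pointers = {}, {}
        for tag in tags:
            prev_tag = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, tag), 0.0),
            )
            current[tag] = (
                best[-1][prev_tag]
                + transitions.get((prev_tag, tag), 0.0)
                + scores[tag]
            )
            pointers[tag] = prev_tag
        best.append(current)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


tags = ["O", "B-ORG", "I-ORG"]
# A strong penalty forbids the invalid transition O -> I-ORG.
transitions = {("O", "I-ORG"): -100.0}
emissions = [
    {"O": 1.0, "B-ORG": 0.9, "I-ORG": 0.0},
    {"O": 0.0, "B-ORG": 0.0, "I-ORG": 1.0},
]
print(viterbi_decode(emissions, transitions, tags))
```

In this toy example, choosing the best tag for each token independently would give the invalid sequence O, I-ORG, whereas decoding with the transition penalty selects B-ORG, I-ORG.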


4.1.2 Semi-supervised Entity Extraction 


Semi-supervised entity extraction starts from a small number of manually provided seed entities for the 
predefined entity types, and uses pattern learning with continuous iterative learning and manual 
adjustment to generate a named entity data set, which reduces the dependence on manually annotated 
corpora. Liu et al. [42] used a small amount of existing labeled data to train initial KNN and CRF 
models, performed semi-supervised learning on tweet data, and improved the training data by learning and 
supplementing data with a large amount of 
unlabeled text. Etzioni et al. [43] proposed the KNOWITALL system, based on a set of predicate inputs, 
using pattern learning, subclass extraction, list extraction (uppercase), and other modes to perform NER on 
unlabeled data. In addition, Zhang and Elhadad [44] proposed a method for extracting named entities from 
biomedical text based on terminology, corpus statistics (such as inverse document frequency and context 
vector), and shallow syntactic knowledge (such as noun phrase chunks), and conducted experiments on 
two mainstream biomedical data sets to verify the method. 


4.1.3 Unsupervised Entity Extraction Methods 


Based on the vocabulary resources, vocabulary patterns, and statistical data that are calculated on a large 
corpus, named entities can be inferred by clustering and combining the similarity in the sentence context. 
Nadeau et al. [45] proposed an unsupervised system for the construction of geographical name dictionaries 
and the resolution of named entity ambiguities. Zhang and Elhadad [44] proposed an unsupervised method 
that uses terminology, corpus statistics and shallow grammatical knowledge to extract named entities from 
biomedical texts, and proved the effectiveness and versatility of the unsupervised method. Brooke et al. [46] 
performed Brown clustering based on pre-segmented expectations, combined with the rank value of each 
class after clustering, and constructed bootstrap seeds for training, which can extract entities for specific 
domain knowledge. Jia et al. [47] used cross-domain language modeling to obtain task and domain 
vectors, and performed NER in both unsupervised and supervised domain adaptation settings. Collins and 
Singer [48] used only seven simple “seed” rules to realize NER on unlabeled data, and proposed two 
unsupervised named entity classification algorithms. 


4.2 Relation Extraction 


Relation extraction is usually considered to be a classification task that predicts the semantic 
relationship between a pair of nominals, and can be defined as follows. Given a sentence $S$ with an 
annotated pair of nominals $e_1$ and $e_2$, we aim to identify the relationship between $e_1$ and $e_2$. 
Relation extraction is usually divided into supervised, unsupervised, and distant supervision relation 
extraction. End-to-end joint extraction of entities and relations is also popular. Supervised data sets 
are of high quality and contain almost no noise, but are often small. SemEval2010 Task 8 [49] contains 
nine directed relation types and 10,717 samples, of which 8,000 are used for training and 2,717 for 
testing. ACE2005 contains 599 documents, which are related to news and e-mail and divided into seven 
main types of relations. Each type of relationship has an average of 700 instances for training and 
testing. In addition to ACE2005, which contains a Chinese corpus, DuIE [50] is another large-scale 
Chinese data set for information extraction. The FR2KG proposed in this study focuses on relation 
extraction in the Chinese financial field. For distant supervision relation extraction, the New York 
Times (NYT) data set is formed by aligning relations in Freebase with the news text. The data set 
contains 52 possible relationship categories and a special relationship category NA (indicating that 
there is no relation between entities). The training data contain 522,611 sentences, 281,270 entity 
pairs, and 18,252 relations. 


4.2.1 Supervised Relation Extraction 


Zeng et al. [51] used a CNN to extract lexical-level and sentence-level features for relation extraction 
tasks. The lexical-level feature vector is formed by concatenating the word vectors of the labeled entities 
and their context with the semantic category features from WordNet. The sentence-level feature representation was 
automatically extracted using a CNN with max pooling. To reduce the impact of an artificial 
class (for example, a catch-all class such as Other), Santos et al. [52] used a pairwise ranking loss function for training instead of cross entropy. Because 
a sentence contains a lot of irrelevant information, methods that extract features from the full word sequence may not 
accurately predict the relationship between the two entities. Therefore, Xu et al. [53] noted that the shortest 
dependency path (SDP) is beneficial for determining the relationship between two entities. Specifically, they 
took the SDP from the subject to the object as input, passed it through the lookup table layer, 
produced local features around each node on the dependency path, and combined these features into a 
global feature vector through a CNN that was then fed to a softmax classifier. Similarly, Xu et al. [54] used 
a four-channel LSTM to extract words, parts of speech, grammatical relations, and WordNet semantic 
features along the SDP. However, SDP-based methods may neglect crucial information outside the path. 
Zhang et al. [55] encoded the complete dependency structure of an input sentence with an efficient 
graph convolutional network (GCN), and then extracted entity-centric representations to make robust 
relation predictions. To avoid the introduction of irrelevant information between entities in the complete 
dependency tree, Guo et al. [56] proposed AGGCN to automatically generate the substructures for relation 
extraction tasks. 
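
As an illustration of the SDP idea, the following minimal sketch finds the shortest path between two tokens by breadth-first search over an undirected view of a dependency tree. The function name `shortest_dependency_path`, the (head, dependent) edge format, and the hand-written parse are assumptions made for this example and are not taken from [53] or [54]; in practice the edges would come from a dependency parser.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the list of tokens on the shortest path from source to target.

    edges is an iterable of (head, dependent) pairs; direction is ignored.
    Returns None when the two tokens are not connected.
    """
    # Build an undirected adjacency list from the dependency edges.
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)

    # Breadth-first search, remembering the parent of every visited token.
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical parse of: "The company acquired a startup in Shanghai last year"
EDGES = [
    ("acquired", "company"), ("company", "The"),
    ("acquired", "startup"), ("startup", "a"),
    ("acquired", "in"), ("in", "Shanghai"),
    ("acquired", "year"), ("year", "last"),
]

sdp = shortest_dependency_path(EDGES, "company", "startup")
print(sdp)
```

Only the tokens on the returned path would be passed to the downstream encoder. Everything off the path is discarded, which is the appeal of the SDP and also the source of the information loss that full-tree encoders try to avoid.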


4.2.2 Distant Supervision Relation Extraction 


Supervised relation extraction requires a large amount of expert-labeled data, which limits the 
application of this method. Therefore, Mintz et al. [57] proposed the distant supervision hypothesis as 
follows. If two entities have a relationship in a known knowledge base, then all sentences that mention 
these two entities are assumed to express that relationship in some way. They then applied this assumption to align 
documents with an existing knowledge base and automatically generate a large amount of training data. Zeng 
et al. [58] proposed a piecewise convolutional neural network (PCNN) to extract features, and used a 
multi-instance learning method to alleviate the data noise problem. However, they failed to fully utilize the 
information across different sentences, and ignored the fact that there can be multiple relationships between 
the same entity pair. Therefore, after using PCNN to extract the features of each sentence in a bag, 
Jiang et al. [59] used cross-sentence max pooling to select features across sentences, and 
then aggregated the most important features into the representation of each entity pair. Finally, a sigmoid 
rather than a softmax is applied to this representation to estimate the probability of each label, which allows multiple labels. Since different sentences 
contribute differently, [60, 61] focused on how to use the attention mechanism to select sentences. 
Inspired by the TransE model [62], for the two entities $e_1$ and $e_2$ of each bag, $e_1 - e_2$ is used to represent 
the relation between the two entities. The features extracted by PCNN and the relation vector $e_1 - e_2$ are concatenated 
to obtain the weight of each sentence, and the feature of the bag is the weighted sum of all sentence 
feature vectors. Du et al. [63] proposed a new multi-layer structured self-attention model based on BiLSTM. 
In this model, the word-level attention mechanism based on a two-dimensional matrix can focus on different 
aspects of a sentence to better learn the contextual representation. The two-dimensional sentence-level 
attention mechanism used for multi-instance learning can focus on different valid instances to better 
select sentences. Many studies use existing knowledge bases to add information that alleviates the problem 
of mislabeling in distant supervision. Vashishth et al. [64] added entity type and relation alias 
information from the knowledge base (KB) to improve prediction performance. In addition, Wang 
et al. [65] proposed a label-free distant supervision method that does not use the relation labels under this 
inadequate assumption, but only uses the prior knowledge derived from the KB to supervise the learning 
of the classifier directly and softly. 
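
The distant supervision hypothesis can be sketched as a simple alignment step. The example below is a minimal illustration under stated assumptions: entity mentions are found by plain substring matching, and the function `build_bags`, the entity names, and the relation `invests_in` are hypothetical and not part of any cited system. Sentences are grouped per ordered entity pair (one bag per pair), and pairs that have no relation in the knowledge base receive the NA label.

```python
from collections import defaultdict


def build_bags(sentences, kb_triples, entities):
    """Align sentences with a knowledge base under the distant supervision hypothesis.

    Returns a dict mapping (head, tail) to (sorted relation labels, sentences).
    """
    # Index the knowledge base by ordered entity pair.
    kb = defaultdict(set)
    for head, relation, tail in kb_triples:
        kb[(head, tail)].add(relation)

    # Every sentence mentioning both entities joins the bag of that pair.
    bags = defaultdict(list)
    for sentence in sentences:
        mentioned = [e for e in entities if e in sentence]
        for head in mentioned:
            for tail in mentioned:
                if head != tail:
                    bags[(head, tail)].append(sentence)

    # Pairs that are absent from the knowledge base are labeled NA.
    return {
        pair: (sorted(kb.get(pair, {"NA"})), sents)
        for pair, sents in bags.items()
    }


# Hypothetical toy data.
KB_TRIPLES = [("Acme Corp", "invests_in", "Beta Ltd")]
ENTITIES = ["Acme Corp", "Beta Ltd", "Gamma Inc"]
SENTENCES = [
    "Acme Corp invested in Beta Ltd last year.",
    "Acme Corp and Beta Ltd attended the same forum.",
    "Beta Ltd supplies parts to Gamma Inc.",
]

bags = build_bags(SENTENCES, KB_TRIPLES, ENTITIES)
for pair, (labels, sents) in bags.items():
    print(pair, labels, len(sents))
```

In this toy data the second sentence does not express `invests_in`, yet it inherits the label because both entities are mentioned. This is the kind of noise that multi-instance learning and sentence-level attention are meant to reduce.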


4.2.3 Unsupervised Relation Extraction 


The unsupervised learning method assumes that the entity pairs with the same semantic relationship have 
similar context information, and the corresponding context information of each entity pair can be used to 
represent the semantic relationship of the entity pair. Hasegawa et al. [66] clustered entity pairs with the 
same contextual semantics, and then selected a core vocabulary to mark the semantic relationship between 
the categories. [67] improved Hasegawa’s hypothesis by eliminating candidate entity pairs with multiple 
relationships or performing multi-level clustering to extract relationships. Davidov et al. [68] used Google 
search as background knowledge to define concept words and, without predefining any relationship 
types, automatically extracted related entities and semantic relationships. Yan et al. [69] 
combined dependency features and shallow grammatical templates, and used clustering methods to extract 
all the semantic relationships of entities in Wikipedia entries from a large-scale corpus. In addition, Bollegala 
et al. [70] analyzed the templates after clustering, found the implicit semantic relationship between the 
entity pairs, and selected suitable extraction templates from the candidate relationship templates, which 
expanded the scope of entity relationships and improved the accuracy and recall rate to a certain extent. 
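
The context-similarity assumption behind these methods can be illustrated with a small clustering sketch. It uses a greedy single-pass clustering with a cosine threshold, which is a simplification assumed here and not the clustering algorithm of [66]; the function names, the threshold value, and the toy contexts are likewise hypothetical. Each cluster is labeled with its most frequent context word.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_pairs(contexts, threshold=0.5):
    """Group entity pairs whose context words are similar.

    contexts maps an entity pair to the list of words seen between its mentions.
    Returns a list of (label, pairs), where label is the most frequent context word.
    """
    clusters = []
    for pair, words in contexts.items():
        if not words:
            continue  # a pair without context cannot be compared
        vec = Counter(words)
        for cluster in clusters:
            # Join the first cluster whose accumulated context is similar enough.
            if cosine(vec, cluster["bag"]) >= threshold:
                cluster["pairs"].append(pair)
                cluster["bag"].update(vec)
                break
        else:
            clusters.append({"pairs": [pair], "bag": Counter(vec)})
    return [(c["bag"].most_common(1)[0][0], c["pairs"]) for c in clusters]


# Hypothetical context words collected for four entity pairs.
CONTEXTS = {
    ("Company A", "Company B"): ["acquired", "acquired", "stake"],
    ("Company C", "Company D"): ["acquired", "shares"],
    ("Person X", "Company A"): ["chairman", "appointed"],
    ("Person Y", "Company C"): ["chairman", "chairman", "resigned"],
}

result = cluster_pairs(CONTEXTS)
for label, pairs in result:
    print(label, pairs)
```

No relation types are given in advance: the labels emerge from the contexts themselves, which is what distinguishes this setting from the supervised and distantly supervised ones.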


4.2.4 End-to-end Entity and Relation Extraction 


Entity and relation extraction can be performed as a pipeline of NER followed by relation classification. This independent framework 
is more flexible, but ignores the correlation between the two tasks. Errors in entity recognition may 
affect the performance of relation classification and lead to error propagation. In contrast to 
the pipeline method, the joint learning framework can use a single model to extract entities and relationships, 
and can effectively integrate information regarding entities and relations. 


An end-to-end approach is to share the model parameters between the entity recognition task and 
relationship classification task. Miwa et al. [71] proposed an end-to-end model that captures both word 
sequence and dependency tree substructure information by stacking tree-structured LSTM on BiLSTM. 
Zheng et al. [21] designed novel tags that contain information regarding entities and the relationships they 
hold. Based on this tagging scheme, the joint extraction of entities and relations can be transformed into a 
tagging problem. However, this scheme has difficulty handling triples of different relations that overlap within a 
sentence. Zeng et al. [72] divided sentences into three types according to the degree of triplet overlap: 
Normal, EntityPairOverlap, and SingleEntityOverlap. They proposed an end-to-end model based on 
sequence-to-sequence learning using a copy mechanism. The encoder converts a natural language sentence 
into a fixed-length semantic vector, and then the decoder reads in this vector and generates multiple triplets. 
To better consider the interaction of different relations in sentences, especially overlapping relations, 
Fu et al. [73] proposed an end-to-end model, GraphRel. In the first stage, GraphRel learns to automatically 
extract hidden features for each word by stacking a BiLSTM sentence encoder and a GCN dependency tree 
encoder to tag entity mention words and predict relation triplets. In the second stage, GraphRel uses a 
novel relation-weighted GCN to better predict the interaction between triples. 
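
The three overlap categories can be made concrete with a short sketch that inspects the triples of one sentence. This is one reading of the categories and not code from [72]: the function name `overlap_types` is hypothetical, entity pairs are treated as unordered, triples are assumed to be distinct, and a sentence may receive both overlap labels.

```python
from itertools import combinations


def overlap_types(triples):
    """Classify the triples of one sentence by how their entities overlap.

    triples is a list of distinct (head, relation, tail) tuples.
    Returns a set drawn from Normal, EntityPairOverlap, SingleEntityOverlap.
    """
    labels = set()
    for (h1, _, t1), (h2, _, t2) in combinations(triples, 2):
        pair1, pair2 = {h1, t1}, {h2, t2}
        if pair1 == pair2:
            # Two triples relate the same two entities.
            labels.add("EntityPairOverlap")
        elif pair1 & pair2:
            # Two triples share one entity but not the whole pair.
            labels.add("SingleEntityOverlap")
    return labels or {"Normal"}


# Hypothetical triples.
normal = [("Acme Corp", "invests_in", "Beta Ltd"), ("Gamma Inc", "located_in", "Shanghai")]
epo = [("Acme Corp", "located_in", "Shanghai"), ("Acme Corp", "founded_in", "Shanghai")]
seo = [("Acme Corp", "invests_in", "Beta Ltd"), ("Acme Corp", "located_in", "Shanghai")]

print(overlap_types(normal))
print(overlap_types(epo))
print(overlap_types(seo))
```

A tagging scheme that assigns a single tag to each token cannot represent the last two cases, because the shared entity would need one tag per triple it takes part in.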


The advantage of the shared model parameters is that there is no need to attach constraints to the two 
subtasks; however, the independent sub-models do not allow the relationship between the two subtasks to 
be fully utilized. Therefore, studies must be conducted for achieving global optimization of joint extraction. 
Based on sharing model parameters, Sun et al. [74] proposed a global loss function to explore the mutual 
influence of the entity and relation models. Most existing methods determine relation types only after all 
entities have been recognized, so the interaction between relation types and entity mentions is not fully 
modeled. Takanobu et al. [75] applied a hierarchical reinforcement learning framework to enhance the 
interaction between entity mentions and relation types. The high-level process detects the relationship 
indicator at a specific location. If a relationship is determined, a low-level process is triggered to identify 
the entity corresponding to the relationship. When low-level tasks are completed, the high-level reinforcement 
learning process continues to search for the next relationship in the sentence. Li et al. [76] transformed the 
entity and relation extraction task into a multi-turn question answering task; that is, entities and 
relations are extracted by identifying answers in the context. This method 
provides a better way to capture the label hierarchy dependency. However, it is 
computationally inefficient because it needs to scan all entity template questions and the related relationship 
template questions for a single sentence. 


5. PARTICIPANTS OVERVIEW AND EVALUATION RESULTS 


A total of 740 teams participated in the evaluation. Among the top 18 teams, three teams were from companies, 
10 teams were from universities, three teams were a combination of universities and companies, and the 
other two teams did not disclose relevant information. Table 7 presents a summary of the prize-winning teams. 


Table 7. Summary of top five teams. 


Rank  Team         Affiliation                                                  F1 Score 
1     UPSIDE-DOWN  State Grid Information & Telecommunication Group Co., Ltd.  0.49704 
2     Solaris99    Peking University                                            0.48340 
3     BOOMBOOM     Shanghai Jiao Tong University                                0.46455 
                   Beijing Yuannian Technology Co., Ltd. 
                   The University of Edinburgh 
4     SGIT         Fujian YIRONG Information Technology Co., Ltd.              0.45376 
5     Iceburg      Beijing University of Posts and Telecommunications           0.41169 


The top five teams in this evaluation have sorted out and submitted a brief description of the methods 
they used. These methods and descriptions are analyzed and summarized below. 


• All teams used rule-based methods or labeling functions to produce a training corpus. Only one team 
  manually labeled 20 research reports as supplementary and validation samples, in addition to the 
  automatically generated samples. 

• All teams used BERT-based models for entity extraction, supplemented by rule-based methods 
  for specific entity types. One team used the BERT-softmax model, three teams 
  used the BERT-CRF model architecture, and the other team used the BERT-MRC [38] architecture. 

• For relationship and attribute extraction, all teams used a method based on co-occurrence. 
  Co-occurrence is the basic assumption of distant supervision; that is, when two entities appear 
  together in a short text, it can be assumed that they have a corresponding relationship. Based on 
  this assumption, three teams used rule-based methods to determine whether the 
  relationship existed, and the other two teams used BERT-based models to classify the relationships. 

• One team used a clustering method to group research reports on similar or identical topics. 
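
The co-occurrence approach with a rule-based filter can be sketched as follows. This is a generic illustration and not the code of any participating team: the helper names, the trigger phrases in `RULES`, the relation `invests_in`, and the sample text are all hypothetical, and entities are assumed to have been recognized already. Candidate pairs are entities that appear in the same sentence, and a candidate is kept only when a trigger phrase occurs between the head and the tail.

```python
import re


def cooccurring_pairs(text, entities):
    """Yield (sentence, head, tail) for ordered entity pairs sharing a sentence."""
    # Split on Western and Chinese sentence-ending punctuation.
    for sentence in re.split(r"[.!?。！？]", text):
        found = [e for e in entities if e in sentence]
        for head in found:
            for tail in found:
                if head != tail:
                    yield sentence, head, tail


def apply_rules(sentence, head, tail, rules):
    """Return the first relation whose trigger phrase lies between head and tail."""
    start = sentence.find(head) + len(head)
    end = sentence.find(tail)
    if end < start:
        return None  # tail does not follow head in this sentence
    between = sentence[start:end]
    for relation, triggers in rules.items():
        if any(trigger in between for trigger in triggers):
            return relation
    return None


def extract_triples(text, entities, rules):
    """Keep only the co-occurring pairs that pass the rule-based filter."""
    triples = []
    for sentence, head, tail in cooccurring_pairs(text, entities):
        relation = apply_rules(sentence, head, tail, rules)
        if relation is not None:
            triples.append((head, relation, tail))
    return triples


# Hypothetical rules and input.
RULES = {"invests_in": ("invested in", "acquired a stake in")}
ENTITIES = ["Acme Corp", "Beta Ltd", "Gamma Inc"]
TEXT = "Acme Corp invested in Beta Ltd last year. Beta Ltd and Gamma Inc both reported losses."

triples = extract_triples(TEXT, ENTITIES, RULES)
print(triples)
```

Replacing `apply_rules` with a trained classifier gives the model-based variant. In both cases the candidate pairs come from the recognized entities, so errors in entity extraction carry over directly to relationship extraction.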


In summary, for the entity, relationship, and attribute extraction tasks of knowledge graph 
construction, methods based on the BERT pre-trained model remain the best-performing and the most widely 
used. Because this evaluation is very close to real industrial application scenarios, 
rule-based methods are still very effective in 
some cases and are a useful complement to the models. 


6. CHALLENGES AND LOOKING AHEAD 


The results of this evaluation show that the highest F1 value for automatically 
constructing a financial knowledge graph based on the predefined schema is approximately 0.5, which is far from the requirements 
of real applications. This raises several challenging topics and new directions for research in knowledge 
graphs. 

• In the field of automatically constructing knowledge graphs with a given schema and seed knowledge 
  graph, the existing methods are not very effective. Developing end-to-end methods or multi-step 
  frameworks to automate the construction of knowledge graphs is still a difficult task. 

• Given a knowledge graph with a schema, it is worthwhile to determine how to automatically annotate 
  training data for entity, attribute, and relationship extraction. In addition, building an excellent 
  model from automatically annotated training data to construct a high-precision and high-recall 
  knowledge graph is still a challenge. 

• In terms of entity extraction, the prize-winning participants in the evaluation combined BERT-based 
  models with rule-based methods. Further research should be conducted on end-to-end 
  models and unified frameworks for this real scenario. 

• Relationship and attribute extraction currently relies on co-occurrence with a rule-based 
  or model-based filter, which depends heavily on the performance of entity extraction. Entity 
  extraction with high precision and high recall leads to good relationship and attribute extraction. 
  However, a method that achieves good relationship and attribute extraction when the extracted 
  entities are considerably noisy is still worth developing. 

• This evaluation did not consider the use of an end-to-end model for the joint extraction of entities 
  and relationships. A possible reason is that there are too many types of entities and relationships, but 
  little training data. Therefore, developing an end-to-end model in this situation is challenging. 

• Studies on extending the knowledge graph schema, for example to 50 entity types, hundreds of entity 
  attributes, and relationships between entities, should be performed. 

• Further research on the automatic construction of multilingual knowledge graphs should be conducted. 
  The evaluation in this paper did not take multilingualism into account, since our goal was a financial 
  research report knowledge graph (FR2KG) in the Chinese language. In particular, the FR2KG data 
  set did not involve the fusion of entities across multiple languages. Constructing a knowledge graph 
  from a multilingual corpus involves many new topics, including entity alignment and the extraction of 
  relationships between entities in different languages. It would also be meaningful to evaluate 
  the automatic construction of multilingual knowledge graphs. 

• This evaluation only implicitly involves the disambiguation and fusion of a small number of entities; there is no 
  explicit evaluation of this area. We expect the evaluation of knowledge disambiguation and fusion 
  to become increasingly active in the future. 

• Studying the difficulty of entity extraction, attribute extraction, and relationship 
  extraction in detail is a significant topic. It is also valuable to set reasonable metrics for the 
  automatic construction of knowledge graphs. 


7. CONCLUSIONS 


In this paper, we introduced a high-quality data set, named financial research report knowledge graph 
(FR2KG), which consists of 17,799 entities, 26,798 relationship triples, and 1,328 attribute triples covering 
10 entity types, 19 relationship types, and 6 attributes. We presented an overview of the evaluation task on the 
automated construction of financial knowledge graphs at CCKS2020. In addition, we summarized the 
technologies for automatically constructing knowledge graphs, and introduced some challenging topics and 
new directions for research in knowledge graphs. 